Topic Representation of Researchers' Interests in a Large-Scale Academic Database and Its Application to Author Disambiguation
Abstract
It is crucial to promote interdisciplinary research and to recommend collaborators from different research fields through academic database analysis. This paper addresses the problem of characterizing researchers' interests with a set of diverse research topics found in a large-scale academic database. Specifically, we first use latent Dirichlet allocation (LDA) to extract topics, represented as distributions over words, from a training dataset. Then, we convert the textual features of each researcher's publications into topic vectors and compute the centroid of these vectors to summarize the researcher's interests as a single vector. In experiments on CiNii Articles, the largest academic database in Japan, we show that the extracted topics reflect the diversity of research fields in the database. The results also indicate that the proposed topic representation is applicable to the author disambiguation problem.
Key words: researcher analysis, academic database, topic model, author disambiguation
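The centroid step described in the abstract, and one way it might feed author disambiguation, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes each publication has already been mapped to a K-dimensional topic vector by a trained LDA model, and it compares two groups of publications that share an author name using cosine similarity against a hypothetical threshold of 0.8. The abstract does not say which similarity measure or decision rule the paper uses, and the topic vectors below are made-up illustrative values.

```python
import math
from collections.abc import Sequence


def centroid(topic_vectors: Sequence[Sequence[float]]) -> list[float]:
    """Average per-publication topic vectors into one interest vector."""
    if not topic_vectors:
        raise ValueError("need at least one publication")
    k = len(topic_vectors[0])
    if any(len(v) != k for v in topic_vectors):
        raise ValueError("all topic vectors must have the same number of topics")
    n = len(topic_vectors)
    return [sum(v[i] for v in topic_vectors) / n for i in range(k)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def same_author(pubs_a: Sequence[Sequence[float]],
                pubs_b: Sequence[Sequence[float]],
                threshold: float = 0.8) -> bool:
    """Hypothetical decision rule: merge two name-sharing publication groups
    when their interest centroids are similar enough. The 0.8 threshold is
    an assumption, not a value from the paper."""
    return cosine(centroid(pubs_a), centroid(pubs_b)) >= threshold


# Illustrative 3-topic vectors (assumed outputs of an LDA model).
author_a = [[0.7, 0.2, 0.1], [0.5, 0.4, 0.1]]
candidate = [[0.65, 0.25, 0.1]]
other = [[0.05, 0.15, 0.8]]

print(same_author(author_a, candidate))  # True
print(same_author(author_a, other))      # False
```

Averaging keeps the interest vector on the same probability simplex as the individual topic distributions, so a researcher with many papers is compared on the same scale as one with a single paper.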
Similar resources
The Crisis of Representation in Azadeh Khanoom and Her Author by Reza Baraheni
The crisis of representation is a topic widely discussed in critique and theory of postmodern literature. This refers to the crises of the present era including the crisis of meaning, the perplexity of contemporary humankind amidst a mass of valid and invalid data, alienation, etc. Literature, as the epitome of human life, is a reflection of these crises in the contemporary era. Azadeh Khanoom ...
Local gradient pattern - A novel feature representation for facial expression recognition
Many researchers adopt Local Binary Pattern for pattern analysis. However, the long histogram created by Local Binary Pattern is not suitable for large-scale facial database. This paper presents a simple facial pattern descriptor for facial expression recognition. Local pattern is computed based on local gradient flow from one side to another side through the center pixel in a 3x3 pixels region...
Improving the Accuracy of Author Name Disambiguation Using Ensemble Clustering
Today, digital libraries are important academic resources that contain millions of citations and essential bibliographic information such as titles, authors' names, and the locations of publications. From the perspective of knowledge management, the ability to search for desired content quickly and accurately is of great importance. The complexity and similarity in these resources cause many challenges and...
"Seed+Expand": A validated methodology for creating high quality publication oeuvres of individual researchers
The study of science at the individual micro-level frequently requires the disambiguation of author names. The creation of author’s publication oeuvres involves matching the list of unique author names to names used in publication databases. Despite recent progress in the development of unique author identifiers, e.g., ORCID, VIVO, or DAI, author disambiguation remains a key problem when it com...
Scaling production and improving efficiency in DEA: an interactive approach
DEA models help a DMU to detect its (in-)efficiency and to improve activities, if necessary. Efficiency is only one economic aim for a decision-maker; however, up- or downsizing might be a second one. Improving efficiency is the main topic in DEA; the long-term strategy towards the right production size should attract our attention as well. Not always the management of a DMU primarily focuses o...
Journal: IEICE Transactions
Volume: 99-D
Published: 2016